Lazy attribute selection: Choosing attributes at classification time
Authors
Abstract
Attribute selection is a data preprocessing step that aims to identify the attributes relevant to the target machine learning task, which in this paper is classification. We propose a new attribute selection strategy, based on a lazy learning approach, that postpones the identification of relevant attributes until an instance is submitted for classification. Our strategy relies on the hypothesis that taking into account the attribute values of the instance to be classified may help identify the best attributes for correctly classifying that particular instance. Experimental results using the k-NN and Naive Bayes classifiers, over 40 different data sets from the UCI Machine Learning Repository and five large data sets from the NIPS 2003 feature selection challenge, show the effectiveness of delaying attribute selection to classification time. In most cases, the proposed lazy technique improves classification accuracy compared with the analogous attribute selection approach performed as a data preprocessing step. We also propose a metric to estimate when a specific data set can benefit from the lazy attribute selection approach.
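The idea of choosing attributes per instance can be illustrated with a minimal sketch. The abstract does not state which scoring measure the authors use, so everything below is an assumption made for illustration: attributes are taken to be discrete, each attribute is scored by the class entropy of the training instances that share the query's value on it, and the hypothetical helpers `class_entropy`, `lazy_select`, and `lazy_knn_predict` are not names from the paper.

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        # Assumption: a value never seen in training is treated as uninformative.
        return float("inf")
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def lazy_select(train_X, train_y, query, r):
    """Score attributes for this query only; return the indices of the r best."""
    scores = []
    for j, value in enumerate(query):
        # Class labels of training instances sharing the query's value on attribute j.
        matching = [y for x, y in zip(train_X, train_y) if x[j] == value]
        scores.append((class_entropy(matching), j))
    # Lower entropy means the query's value on that attribute is more predictive.
    return [j for _, j in sorted(scores)[:r]]


def lazy_knn_predict(train_X, train_y, query, r, k=1):
    """k-NN using Hamming distance over the lazily selected attributes only."""
    selected = lazy_select(train_X, train_y, query, r)

    def distance(x):
        return sum(x[j] != query[j] for j in selected)

    nearest = sorted(range(len(train_X)), key=lambda i: distance(train_X[i]))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for this sketch): two discrete attributes, two classes.
train_X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("b", "x")]
train_y = ["pos", "pos", "neg", "pos", "neg"]

print(lazy_select(train_X, train_y, ("a", "x"), r=1))
print(lazy_select(train_X, train_y, ("b", "y"), r=1))
print(lazy_knn_predict(train_X, train_y, ("a", "x"), r=1))
```

On this toy data the two queries end up with different selected attributes, which is the behavior a selection step performed once during preprocessing cannot provide: it would fix a single attribute subset for every instance.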
Similar resources
Improving Lazy Attribute Selection
Attribute selection is a data preprocessing step which aims at identifying relevant attributes for a target data mining task – specifically in this article, the classification task. Previously, we have proposed a new attribute selection strategy – based on a lazy learning approach – which postpones the identification of relevant attributes until an instance is submitted for classification. Expe...
LBR-Meta: An Efficient Algorithm for Lazy Bayesian Rules
LBR is a highly accurate classification algorithm, which lazily constructs a single Bayesian rule for each test instance at classification time. However, the computational complexity of its attribute-value pair selection is quadratic in the number of attributes. This incurs high computational costs, especially for datasets of high dimensionality. To solve the problem, this paper proposes an ef...
A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset
Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationarity and uncertainty of the signals should be used for ECG analysis. Ex...
Support Vector Machine Based Facies Classification Using Seismic Attributes in an Oil Field of Iran
Seismic facies analysis (SFA) aims to classify similar seismic traces based on amplitude, phase, frequency, and other seismic attributes. SFA has proven useful in interpreting seismic data, allowing significant information on subsurface geological structures to be extracted. While facies analysis has been widely investigated through unsupervised-classification-based studies, there are few cases...
Learning Lazy Rules to Improve the Performance of Classifiers
Based on an earlier study on lazy Bayesian rule learning, this paper introduces a general lazy learning framework, called LazyRule, that begins to learn a rule only when classifying a test case. The objective of the framework is to improve the performance of a base learning algorithm. It has the potential to be used for different types of base learning algorithms. LazyRule performs attribute eli...
Journal: Intell. Data Anal.
Volume: 15
Publication date: 2011